Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lift clauses in LogicalAst #2449

Merged
merged 1 commit into from
Aug 14, 2024
Merged

lift clauses in LogicalAst #2449

merged 1 commit into from
Aug 14, 2024

Conversation

PSeitz
Copy link
Contributor

@PSeitz PSeitz commented Jul 5, 2024

(a OR b) OR (c OR d) can be simplified to (a OR b OR c OR d)
(a AND b) AND (c AND d) can be simplified to (a AND b AND c AND d)

This directly affects how queries are executed and the performance

Remove SumWithCoordsCombiner, it is effectively the same as SumCombiner, but with an unused field

simplify_ast

Copy link
Collaborator

@fulmicoton fulmicoton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it works for all Occur

// If clauses below have the same `Occur`, we can pull them up
match simplified_sub_ast {
LogicalAst::Clause(sub_clauses)
if sub_clauses.iter().all(|(o, _)| *o == occur) =>
Copy link
Collaborator

@fulmicoton fulmicoton Jul 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if Occur is Not?
Can we stay on the safe side and only apply this logic for occur = Must, Should, or Filter?

Also this will only yield the same score if we sum scores of child branches.
(I think this is the case today: we don't rely on dismax do we?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes good catch, I excluded it and added a test. There are more optimization we can do here to flatten clauses later on

(a OR b) OR (c OR d) can be simplified to (a OR b OR c OR d)
(a AND b) AND (c AND d) can be simplified to (a AND b AND c AND d)

This directly affects how queries are executed

remove unused SumWithCoordsCombiner
the number of fields is unused and private
Copy link
Contributor

@trinity-1686a trinity-1686a left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe other simplifications are possible (see below), this one seems correct

  • top: Should, inner: single Must => Should
  • top: Must, inner: single Should => Must
  • top: MustNot, inner: all MustNot => all Should impact scoring (or we need to wrap with a boost of 0)
  • top: MustNot, inner: single Must|Should => MustNot
  • top: Must, inner: all MustNot => all MustNot

@fulmicoton
Copy link
Collaborator

@trinity-1686a Can you open a ticket and auto-assign for this?

@PSeitz PSeitz merged commit 27be6ae into main Aug 14, 2024
4 checks passed
@PSeitz PSeitz deleted the simplify_ast branch August 14, 2024 17:21
philippemnoel pushed a commit to paradedb/tantivy that referenced this pull request Aug 31, 2024
(a OR b) OR (c OR d) can be simplified to (a OR b OR c OR d)
(a AND b) AND (c AND d) can be simplified to (a AND b AND c AND d)

This directly affects how queries are executed

remove unused SumWithCoordsCombiner
the number of fields is unused and private
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants